The cycle of observation, hypothesis formulation, experimentation, and falsification that has driven scientific and technical progress is increasingly being automated in each of its separate components. However, integration between these automated components is lacking: theories are not placed in the same formal context as the (coded) protocols that are supposed to test them, so neither description knows about the other, although both aim to describe the same process.
We develop integrated descriptions from which we can extract both the model of a phenomenon (for possibly automated mathematical analysis) and the steps carried out to test it (for automated execution by lab equipment). This is essential if we want to carry out automated model synthesis, falsification, and inference while taking into account the uncertainties in both the model structure and the equipment tolerances that may jointly affect the results of experiments.
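As a minimal sketch of what such an integrated description could look like (the classes and names below are hypothetical, invented for illustration), a single Python object can expose both projections: the reaction model for analysis and the protocol steps for execution.

    # Hypothetical sketch: one description, two projections.
    class Reaction:
        def __init__(self, reactants, products, rate):
            self.reactants, self.products, self.rate = reactants, products, rate

    class Experiment:
        def __init__(self, reactions, samples):
            self.reactions = reactions  # shared by both projections
            self.samples = samples      # initial sample volumes

        def model(self):
            """Projection 1: mass-action terms for mathematical analysis."""
            return [(r.reactants, r.products, r.rate) for r in self.reactions]

        def protocol(self):
            """Projection 2: concrete steps for automated lab execution."""
            steps = [f"dispense {v} uL of {s}" for s, v in self.samples.items()]
            steps += [f"incubate; observe {'+'.join(r.products)}" for r in self.reactions]
            return steps

    exp = Experiment([Reaction(["A", "B"], ["C"], rate=0.1)], {"A": 50, "B": 50})
    print(exp.model())     # feeds mathematical analysis
    print(exp.protocol())  # feeds lab equipment

Because both projections derive from the same object, uncertainty in the model structure and tolerances in the execution steps can be represented and reasoned about in one place.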
Molecular programming uses the computational power of DNA and other biomolecules to create nanoscale systems. Many of these will be safety-critical, such as bio-compatible diagnostic sensors and targeted drug-delivery devices. The design of these programmed molecular systems needs to assure safe outcomes from very many, very small, fault-prone components operating simultaneously in a dynamic environment. Some of this can be achieved by adapting existing software engineering methods, but molecular programming also presents new challenges that will require new methods. This talk discusses an example of such a challenge, describes how we design safety-critical building blocks, and presents work in progress to ascertain how robust a molecular program is. Similar approaches will also help design safe non-molecular systems with highly distributed, autonomous, fault-prone components operating in dynamic environments.
While reactive synthesis and syntax-guided synthesis (SyGuS) have seen enormous progress in recent years, combining the two approaches has remained a challenge. In this work, we present the synthesis of reactive programs from Temporal Stream Logic modulo theories (TSL-MT), a framework that unites the two approaches to synthesize a single program. In our approach, reactive synthesis and SyGuS collaborate in the synthesis process and generate executable code that implements both reactive and data-level properties. We also present a tool, temos, that combines state-of-the-art methods in reactive synthesis and SyGuS to synthesize programs from TSL-MT specifications. We demonstrate the applicability of our approach on a set of benchmarks and present an in-depth case study on synthesizing a music keyboard synthesizer.
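To give a flavor of such specifications (the formula, signal, and function names below are our own illustrative inventions, not taken from the paper's benchmarks), a TSL-MT property for a keyboard might require that every key press updates the output frequency through a theory function:

    G (press key -> [freq <- toFreq key])

Roughly, the temporal structure (the "always" operator G and the response pattern) falls to the reactive synthesizer, while finding a concrete implementation of the data transformation toFreq in the underlying theory falls to SyGuS.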
Databases typically process huge amounts of data faster than applications with hand-coded data access. Even though modern dynamic runtimes optimize applications intensively, they cannot perform certain optimizations that are traditionally used by database systems, as they lack the required information. Thus, we propose to extend the capabilities of dynamic runtimes so that they can collect fine-grained information about the processed data at run time and use it to perform database-like optimizations. By doing so, we want to enable dynamic runtimes to significantly boost the performance of data-processing workloads; ideally, applications should become as fast as databases on such workloads. To show the feasibility of our approach, we are implementing it in a polyglot dynamic runtime.
Dynamic compilers collect a variety of information to optimize programs and achieve peak performance. Yet, particularly in data-heavy applications, analysis of the processed data (data structures, metrics, relations) could enable additional optimizations in terms of access patterns and data locality. Query planning in database systems is one source of inspiration, but the overhead required to collect such information has made it infeasible in dynamic compilers. With this project, we propose integrating data analysis into a dynamic runtime to speed up big-data applications. The goal is to use the detailed run-time information for speculative compiler optimizations based on the shape and characteristics of the data to improve performance.
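As a hedged sketch of the idea (the profiling interface below is invented for illustration; it is not the runtime's actual mechanism), a runtime could collect cheap statistics about the values flowing through a hot operation, much like a query planner, and speculatively dispatch to a specialized implementation:

    import bisect

    def profile(values, sample_size=1000):
        """Collect cheap statistics about the data, as a query planner would."""
        sample = values[:sample_size]
        return {
            "all_ints": all(isinstance(v, int) for v in sample),
            "sorted": all(a <= b for a, b in zip(sample, sample[1:])),
        }

    def contains(values, key):
        stats = profile(values)
        if stats["all_ints"] and stats["sorted"]:
            # Speculative fast path: assume the whole list is sorted and
            # binary-search it. A real runtime would guard this assumption
            # and deoptimize to the generic path if it is ever violated.
            i = bisect.bisect_left(values, key)
            return i < len(values) and values[i] == key
        return key in values  # generic path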
Modern web applications are continuously evolving and becoming increasingly reliant on web frameworks to support their ever-changing needs. This calls for efficient static analysis techniques for bug finding and security auditing of such applications. However, most of these frameworks are written in JavaScript, which is difficult to analyze due to its extremely dynamic nature. The primary goal of this work is to study the effectiveness of present state-of-the-art call graph approaches for JavaScript and to propose techniques that enhance them so that they discover more of the crucial functions and call edges in modern, framework-based JavaScript applications. Ideally, these new techniques should improve function and call edge discovery without significantly impacting precision and scalability.
State-of-the-art managed runtimes apply aggressive optimizations speculating that programs have low variability. However, the literature shows that program behavior may evolve at run time and go through different execution phases. This variable behavior may hide optimization opportunities. Taking phases into account may thus improve the performance of phase-sensitive optimizations such as lookup caches, which can become monomorphization bottlenecks when they contain phase-specific entries.
In this project, we introduce phase-based splitting, an optimization that uses run-time phase information to guide splitting-based monomorphization. Preliminary results with a first prototype on micro-benchmarks show promising speedups, ranging from 10 to 20% on average and peaking at 47.6% for a single phase. As next steps, we will evaluate our approach on a richer set of benchmarks and real-world applications, and define heuristics to better guide phase-based splitting.
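The intuition can be sketched in Python (a conceptual illustration, not the prototype's implementation): a call site shared across phases stays polymorphic even though each phase on its own is monomorphic, and splitting it per phase restores monomorphic caches.

    # One shared call site sees Circle in phase 1 and Square in phase 2,
    # so its lookup cache holds {Circle, Square} for the whole run.
    class Circle:
        def __init__(self, r): self.r = r
        def area(self): return 3.14159 * self.r ** 2

    class Square:
        def __init__(self, side): self.side = side
        def area(self): return self.side ** 2

    def total_area(shapes):           # shared site: polymorphic cache
        return sum(s.area() for s in shapes)

    # Phase-based splitting, conceptually: one copy per phase, so each
    # copy's cache holds a single entry and the call can be monomorphized.
    def total_area_phase1(circles):   # cache = {Circle}
        return sum(c.area() for c in circles)

    def total_area_phase2(squares):   # cache = {Square}
        return sum(s.area() for s in squares)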
Program synthesis has seen many new applications in recent years, in large part thanks to the introduction of SyGuS. However, no existing SyGuS solver supports synthesizing recursive functions. We introduce a multi-phase algorithm for the programming-by-example synthesis of recursive "loop-like" programs in SyGuS. We solve each constraint individually, treat the results as "unrolled" examples of how a recursive program would behave, and then solve for the generalized recursive solution. Our approach is modular and supports any SyGuS solver.
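As a hedged illustration of the unrolling step (the encoding below is our own simplification, not the algorithm's actual representation), input-output examples of a recursive function over lists can be rewritten into non-recursive examples of a step function g, where f(x:xs) = g(x, f(xs)); an off-the-shelf SyGuS solver then only needs to synthesize g.

    # Tail-closed examples of a recursive function f (here, list length).
    examples = {(): 0, ("c",): 1, ("b", "c"): 2, ("a", "b", "c"): 3}

    def unroll(examples):
        step_examples = []
        for xs, out in examples.items():
            if xs:  # relate each non-empty input to its tail's example
                head, tail = xs[0], xs[1:]
                step_examples.append(((head, examples[tail]), out))
        return step_examples

    # Each entry states g(head, f(tail)) == f(xs). Here every entry
    # reduces to g(h, r) == r + 1, so a SyGuS solver returns that g,
    # which generalizes to: f(xs) = 0 if xs == () else g(xs[0], f(xs[1:])).
    print(unroll(examples))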
Building connections between mathematical expressions and their visual representations increases conceptual understanding and flexibility. However, students rarely practice visualizing abstract mathematical relationships because developing diagrammatic problems is challenging, especially at scale. To address this issue, we introduce Edgeworth, a system that automatically generates correct and incorrect diagrams for a given question prompt. It does so by mutating declarative mathematical statements with visual semantics. We evaluated the system by recreating diagrammatic problems in a widely used geometry textbook.
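As a schematic example of such mutation (the statement syntax below is invented for illustration; it is not Edgeworth's actual input language), perturbing one argument of one declarative statement yields a specification that renders to a subtly incorrect diagram:

    import random

    # Correct declarative specification of a geometry problem.
    correct = ["Triangle(A, B, C)", "RightAngle(A, B, C)", "Midpoint(M, A, C)"]

    def mutate(statements, symbols=("A", "B", "C", "M")):
        """Replace one argument of one randomly chosen statement."""
        s = list(statements)
        i = random.randrange(len(s))
        name, rest = s[i].split("(")
        args = [x.strip(" )") for x in rest.split(",")]
        j = random.randrange(len(args))
        args[j] = random.choice([x for x in symbols if x != args[j]])
        s[i] = name + "(" + ", ".join(args) + ")"
        return s

    print(mutate(correct))  # e.g. Midpoint(M, A, C) -> Midpoint(M, A, B)

Rendering both the original and its mutants then supplies a correct diagram alongside plausible incorrect alternatives for the same question prompt.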
While the decidability of a type system may in some cases be a reasonable sacrifice in exchange for greatly increased expressivity, having a type checker that is guaranteed to terminate while maintaining a high degree of expressivity can be beneficial. Over the last few years, several papers and theses have described decidable variants of DOT; however, these variants all suffer from similar limitations on expressivity. We aim to increase the expressivity of these variants without sacrificing decidability.
Static verification is used to ensure the correctness of programs. While useful in critical applications, its general applicability is limited by the high overhead associated with writing specifications. Similarly, the run-time costs introduced by dynamic verification limit its practicality. Gradual verification validates partially specified code statically where possible and dynamically where necessary. As a result, software developers gain granular control over the trade-offs between static and dynamic verification. This paper presents gradual verification end-to-end and in action, with a focus on applying it to C0 (a safe subset of C) and implementing the required dynamic verification.
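To make the static/dynamic split concrete (a minimal Python sketch of the general idea; the actual specification syntax for C0 and the checker differ), a gradual verifier can discharge each conjunct of a partially specified contract statically when it follows from known facts and otherwise turn it into a run-time check:

    def verify_call(static_facts, precondition):
        """Return the residual checks that must be performed at run time.
        'Implied by known facts' is naively modeled as set membership."""
        residual = []
        for conjunct in precondition:
            if conjunct in static_facts:
                continue                 # proven statically: no run-time cost
            residual.append(conjunct)    # imprecise part: check dynamically
        return residual

    # The caller statically established only `x != NULL`, so `len(x) > 0`
    # becomes an assertion injected before the call.
    print(verify_call({"x != NULL"}, ["x != NULL", "len(x) > 0"]))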
The problem of source code authorship attribution is important for several reasons, security and legal issues being the most prominent. However, this domain could also help us understand the nature of personal coding style; such information could be used, for instance, by IDEs to improve the developer's experience of writing code. The goal of this study is to construct an interpretable model for generating source code embeddings that represent the correspondence between source code and its author.
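As a hedged sketch of what an interpretable embedding could look like (the feature set and nearest-neighbor model are chosen for illustration, not taken from the study), each dimension can be a named stylistic feature, so an attribution decision can be traced back to concrete coding habits:

    import ast

    FEATURES = ["functions", "classes", "comprehensions", "lambdas", "avg_name_len"]

    def embed(source):
        """Embed a source file as a vector of human-readable style features."""
        tree = ast.parse(source)
        names = [n.id for n in ast.walk(tree) if isinstance(n, ast.Name)]
        counts = {
            "functions": sum(isinstance(n, ast.FunctionDef) for n in ast.walk(tree)),
            "classes": sum(isinstance(n, ast.ClassDef) for n in ast.walk(tree)),
            "comprehensions": sum(isinstance(n, ast.ListComp) for n in ast.walk(tree)),
            "lambdas": sum(isinstance(n, ast.Lambda) for n in ast.walk(tree)),
            "avg_name_len": sum(map(len, names)) / max(len(names), 1),
        }
        return [counts[f] for f in FEATURES]

    def attribute(snippet, labeled):
        """Nearest-neighbor attribution over (author, source) pairs."""
        v = embed(snippet)
        def dist(w):
            return sum((a - b) ** 2 for a, b in zip(v, w))
        return min(labeled, key=lambda p: dist(embed(p[1])))[0]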